 nn policy


Symbolic Distillation for Learned TCP Congestion Control

Neural Information Processing Systems

Recent advances in TCP congestion control (CC) have achieved tremendous success with deep reinforcement learning (RL) approaches, which use feedforward neural networks (NN) to learn complex environment conditions and make better decisions. However, such "black-box" policies lack interpretability and reliability, and they often need to operate outside the traditional TCP datapath because of their complex NNs. This paper proposes a novel two-stage solution to achieve the best of both worlds: first train a deep RL agent, then distill its (over-)parameterized NN policy into white-box, lightweight rules in the form of symbolic expressions that are much easier to understand and to implement in constrained environments. At the core of our proposal is a novel symbolic branching algorithm that makes the rule aware of its context in terms of various network conditions, eventually converting the NN policy into a symbolic tree. The distilled symbolic rules preserve and often improve performance over state-of-the-art NN policies while being faster and simpler than a standard neural network. We validate the performance of our distilled symbolic rules in both simulation and emulation environments.
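The two-stage idea in this abstract (train a policy, then distill it into branch-wise symbolic rules) can be illustrated with a toy sketch. This is not the paper's symbolic branching algorithm: `teacher_policy` is a hypothetical stand-in for a trained NN policy, the two input features (`latency_ratio`, `loss_rate`) and the hand-picked branching threshold are assumptions, and each branch is fit with plain least squares rather than symbolic regression.

```python
import random

def teacher_policy(latency_ratio, loss_rate):
    """Hypothetical stand-in for a trained NN policy (returns a rate change)."""
    if loss_rate > 0.05:
        return -0.5 * loss_rate - 0.1 * latency_ratio  # back off under loss
    return 0.2 - 0.1 * latency_ratio                   # otherwise probe

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_linear(rows):
    """Least-squares fit of y ~ w0 + w1*latency + w2*loss (normal equations)."""
    feats = [(1.0, lat, loss) for lat, loss, _ in rows]
    ys = [y for _, _, y in rows]
    A = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    b = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(3)]
    return solve(A, b)

def distill(rows, threshold):
    """Split samples on loss rate and fit one linear rule per branch."""
    low = [r for r in rows if r[1] <= threshold]
    high = [r for r in rows if r[1] > threshold]
    return {"threshold": threshold, "low": fit_linear(low), "high": fit_linear(high)}

def student_policy(tree, latency_ratio, loss_rate):
    """Evaluate the distilled two-leaf rule tree."""
    w = tree["high"] if loss_rate > tree["threshold"] else tree["low"]
    return w[0] + w[1] * latency_ratio + w[2] * loss_rate

# Stage 1 is assumed done; here we only query the teacher on sampled states.
random.seed(0)
rows = []
for _ in range(500):
    lat = random.uniform(1.0, 3.0)
    loss = random.uniform(0.0, 0.2)
    rows.append((lat, loss, teacher_policy(lat, loss)))

# Stage 2: distill into a branching rule (threshold chosen by hand here).
tree = distill(rows, threshold=0.05)
max_err = max(abs(student_policy(tree, lat, loss) - y) for lat, loss, y in rows)
print(tree)
print("max abs error:", max_err)
```

The resulting `tree` is a small dictionary of coefficients that can be read directly as an if/else rule. A real distillation would have to search for the branching conditions and for richer symbolic expressions rather than assume them.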


A Computational Method for Solving the Stochastic Joint Replenishment Problem in High Dimensions

Ata, Barış, van Eekelen, Wouter, Zhong, Yuan

arXiv.org Artificial Intelligence

We consider a discrete-time formulation for a class of high-dimensional stochastic joint replenishment problems. First, we approximate the problem by a continuous-time impulse control problem. Exploiting connections among the impulse control problem, backward stochastic differential equations (BSDEs) with jumps, and the stochastic target problem, we develop a novel, simulation-based computational method that relies on deep neural networks to solve the impulse control problem. Based on that solution, we propose an implementable inventory control policy for the original (discrete-time) stochastic joint replenishment problem, and test it against the best available benchmarks in a series of test problems. For the problems studied thus far, our method matches or beats the best benchmark we could find, and it is computationally feasible up to at least 50 dimensions -- that is, 50 stock-keeping units (SKUs).


Reviews: Towards Generalization and Simplicity in Continuous Control

Neural Information Processing Systems

The paper evaluates the natural policy gradient algorithm with simple linear policies on a wide range of "challenging" problems from the OpenAI MuJoCo environments, and shows that these shallow policy networks can learn effective policies in most domains, sometimes faster than NN policies. It further explores learning robust and more global policies by modifying existing domains, e.g. The first part of the paper, while not proposing new approaches, offers interesting insights into the performance of linear policies, given the plethora of prior work that applies NN policies by default to these problems. This part could be further strengthened by an ablation study on the RL optimizer: specifically, GAE, sigma vs. alpha in Eq. 5, and small vs. large trajectory batches (SGD vs. batch optimization).


Machine Learning-Based Automated Design Space Exploration for Autonomous Aerial Robots

Krishnan, Srivatsan, Wan, Zishen, Bharadwaj, Kshitij, Whatmough, Paul, Faust, Aleksandra, Neuman, Sabrina, Wei, Gu-Yeon, Brooks, David, Reddi, Vijay Janapa

arXiv.org Artificial Intelligence

Building domain-specific architectures for autonomous aerial robots is challenging due to a lack of systematic methodology for designing onboard compute. We introduce a novel performance model called the F-1 roofline to help architects understand how to build a balanced computing system for autonomous aerial robots, considering both the cyber components (sensor rate, compute performance) and the physical components (body dynamics) that affect the performance of the machine. We use F-1 to characterize commonly used learning-based autonomy algorithms with onboard platforms to demonstrate the need for cyber-physical co-design. To navigate the cyber-physical design space automatically, we subsequently introduce AutoPilot. This push-button framework automates the co-design of cyber-physical components for aerial robots from a high-level specification guided by the F-1 model. AutoPilot uses Bayesian optimization to automatically co-design the autonomy algorithm and hardware accelerator while considering various cyber-physical parameters to generate an optimal design under different task-level complexities for different robots and sensor frame rates. As a result, designs generated by AutoPilot, on average, lower mission time by up to 2x over baseline approaches, conserving battery energy.